Unsupervised Prediction of Acceptability Judgements

نویسندگان

  • Jey Han Lau
  • Alexander Clark
  • Shalom Lappin
چکیده

In this paper we present the task of unsupervised prediction of speakers’ acceptability judgements. We use a test set generated from the British National Corpus (BNC) containing both grammatical sentences and sentences containing a variety of syntactic infelicities introduced by round trip machine translation. This set was annotated for acceptability judgements through crowd sourcing. We trained a variety of unsupervised language models on the original BNC, and tested them to see the extent to which they could predict mean speakers’ judgements on the test set. To map probability to acceptability, we experimented with several normalisation functions to neutralise the effects of sentence length and word frequencies. We found encouraging results with the unsupervised models predicting acceptability across two different datasets. Our methodology is highly portable to other domains and languages, and the approach has potential implications for the representation and the acquisition of linguistic knowledge.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Grammaticality, Acceptability, and Probability: A Probabilistic View of Linguistic Knowledge

The question of whether humans represent grammatical knowledge as a binary condition on membership in a set of well-formed sentences, or as a probabilistic property has been the subject of debate among linguists, psychologists, and cognitive scientists for many decades. Acceptability judgments present a serious problem for both classical binary and probabilistic theories of grammaticality. Thes...

متن کامل

Acceptability Prediction by Means of Grammaticality Quantification

We propose in this paper a method for quantifying sentence grammaticality. The approach based on Property Grammars, a constraint-based syntactic formalism, makes it possible to evaluate a grammaticality index for any kind of sentence, including ill-formed ones. We compare on a sample of sentences the grammaticality indices obtained from PG formalism and the acceptability judgements measured by ...

متن کامل

Must diphone synthesis be so unnatural?

An English utterance was synthesized in four versions using sets of diphones produced under four different prosodic and contextual conditions. The synthesis used either accented diphones only or appropriately located accented and unaccented diphones, with each of these conditions being repeated using neutral-context and differentiated-context diphones. They were presented to two listener groups...

متن کامل

Alert correlation and prediction using data mining and HMM

Intrusion Detection Systems (IDSs) are security tools widely used in computer networks. While they seem to be promising technologies, they pose some serious drawbacks: When utilized in large and high traffic networks, IDSs generate high volumes of low-level alerts which are hardly manageable. Accordingly, there emerged a recent track of security research, focused on alert correlation, which ext...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2015